Introduction

This tutorial has two major aims. The first is to show the general workflow of how land cover classifications (or similar tasks) based on satellite data can be performed in R using machine learning algorithms. The second is to show how to assess the area to which a spatial prediction model can be applied (the “area of applicability”, AOA). This is relevant because in spatial predictive mapping, models are often applied to make predictions far beyond the sampling locations (i.e. the field observations used to map a variable, even on a global scale), where new locations might differ considerably in their environmental properties. Areas in the predictor space without support of training data are problematic: the model has no knowledge about these environments, and predictions for such areas have to be considered highly uncertain.

Prediction task

The example prediction task is to perform a supervised land cover classification for the city of Münster, Germany. The dataset for this includes selected spectral channels of a Sentinel-2 scene as well as derived artificial channels (the NDVI and the standard deviation of the NDVI in a 5x5 pixel environment). As response (reference/ground truth) we use digitized polygons that were created on the basis of expert knowledge.

How to start

For this tutorial we need the stars package for processing the satellite data as well as the caret package as a wrapper for machine learning (here: randomForest) algorithms. The sf package is used for handling the training data, which are available as vector data (polygons). Mapview is used for interactive spatial visualization of the data. CAST will be used to account for spatial dependencies during model validation as well as for the estimation of the AOA.

#major required packages:
library(stars)
library(caret)
library(mapview)
library(sf)
library(CAST)
#additional required packages:
library(tmap)
library(latticeExtra)
library(doParallel)
library(parallel)
library(Orcs)

Data preparation

Load and explore the data

To start with, let’s load and explore the remote sensing raster data as well as the vector data that include the training sites.

Raster data (predictor variables)

sen_ms <- read_stars("data/Sen_Muenster.grd") %>%
st_set_dimensions("band", c("B02", "B03", "B04", "B08", "B06", "B07", "B11", "NDVI", "NDVI_sd_5")) %>%
  split("band")
sen_ms
## stars object with 2 dimensions and 9 attributes
## attribute(s):
##       B02               B03               B04               B08       
##  Min.   :   37.5   Min.   :  211.5   Min.   :    1.0   Min.   :    1  
##  1st Qu.:  949.0   1st Qu.:  831.5   1st Qu.:  602.5   1st Qu.: 1506  
##  Median : 1036.0   Median :  914.5   Median :  760.0   Median : 1902  
##  Mean   : 1090.9   Mean   :  968.7   Mean   :  836.7   Mean   : 2089  
##  3rd Qu.: 1158.5   3rd Qu.: 1031.0   3rd Qu.:  970.0   3rd Qu.: 2475  
##  Max.   :14582.5   Max.   :15787.5   Max.   :18767.5   Max.   :19929  
##       B06               B07               B11              NDVI         
##  Min.   :  313.2   Min.   :  279.8   Min.   :  110.4   Min.   :-0.9969  
##  1st Qu.: 1457.6   1st Qu.: 1615.9   1st Qu.: 1432.2   1st Qu.: 0.2039  
##  Median : 1691.6   Median : 1908.3   Median : 1598.2   Median : 0.3892  
##  Mean   : 1809.2   Mean   : 2109.1   Mean   : 1641.0   Mean   : 0.3919  
##  3rd Qu.: 2054.8   3rd Qu.: 2383.6   3rd Qu.: 1801.5   3rd Qu.: 0.5695  
##  Max.   :10973.0   Max.   :11645.3   Max.   :13756.9   Max.   : 0.9989  
##    NDVI_sd_5        
##  Min.   :3.971e-05  
##  1st Qu.:2.380e-03  
##  Median :3.924e-03  
##  Mean   :3.910e-03  
##  3rd Qu.:5.238e-03  
##  Max.   :1.574e-02  
## dimension(s):
##   from  to  offset delta                       refsys point values x/y
## x    1 798  400620    10 +proj=utm +zone=32 +ellps...    NA   NULL [x]
## y    1 664 5760200   -10 +proj=utm +zone=32 +ellps...    NA   NULL [y]

The stars object contains a subset of the optical data from Sentinel-2 (see band information here: https://en.wikipedia.org/wiki/Sentinel-2), given in scaled reflectances (B02-B11). In addition, the NDVI was calculated, and spatial context is included as the standard deviation of the NDVI in a 5x5 pixel environment (NDVI_sd_5). Let’s plot the stars object to get an idea of what the variables look like; the following plot applies a histogram (quantile) stretch to each band:

plot(merge(sen_ms), join_zlim = FALSE)

Vector data (Response variable)

The vector file is read as an sf object. It contains the training sites of seven land cover classes. These are polygons (33 in total) that were digitized in QGIS on the basis of the Sentinel data, with support of an aerial image and using expert knowledge. They can be regarded here as ground truth for the land cover classification.

trainSites <- read_sf("data/trainingsites_muenster.gpkg")
print(trainSites)
## Simple feature collection with 33 features and 3 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 400682 ymin: 5753925 xmax: 408327.9 ymax: 5760113
## projected CRS:  WGS 84 / UTM zone 32N
## # A tibble: 33 x 4
##    ClassID Label     PolygonID                                              geom
##      <dbl> <chr>         <int>                                     <POLYGON [m]>
##  1      11 Planted …         1 ((402789.2 5759326, 402850.9 5759266, 402901 575…
##  2      11 Planted …         2 ((403154.2 5759459, 403357.6 5759470, 403373 575…
##  3      11 Planted …         3 ((403124.5 5759788, 403314.6 5759806, 403397.5 5…
##  4      14 Inland w…         4 ((404809.3 5757129, 404865.6 5757163, 404865.6 5…
##  5      14 Inland w…         5 ((403729 5756102, 403797.5 5756077, 403770.6 575…
##  6      14 Inland w…         6 ((408139 5759018, 408201.5 5759011, 408189.7 575…
##  7      14 Inland w…         7 ((406992.5 5756470, 407110.9 5756441, 407184.3 5…
##  8      14 Inland w…         8 ((406334.9 5755042, 406351.1 5755068, 406382.8 5…
##  9       4 Grassland         9 ((403302.3 5755592, 403332.2 5755597, 403337.9 5…
## 10       4 Grassland        10 ((405602.6 5760078, 405610.9 5760104, 405628.3 5…
## # … with 23 more rows

Using mapview’s viewRGB function we can visualize the Sentinel-2 channels as a true color composite in their geographical context and overlay them with the polygons. Click on a polygon to see which land cover class is assigned to it.

viewRGB(as(sen_ms, "Raster"), r = 3, g = 2, b = 1, map.types = "Esri.WorldImagery") + mapview(trainSites)

Extract pixel information

In order to train a machine learning model relating the spectral properties to the land cover class, we first need to create a data frame that contains the predictor variables at the locations of the training sites as well as the corresponding class information. Here we rasterize the polygon IDs onto the Sentinel grid, convert the result to a data frame (one row per pixel covered by a polygon), and then merge it with the land cover class information from the sf object.

st_crs(sen_ms) = st_crs(trainSites) # shouldn't be needed!
sen_ms$PolygonID = st_rasterize(trainSites["PolygonID"], sen_ms[1] * NA)
extr <- sen_ms %>%
  as.data.frame() %>%
  na.omit() %>%
  merge(trainSites)
head(extr)
##   PolygonID      x       y   B02 B03 B04    B08      B06      B07      B11
## 1         1 402775 5759315 921.5 877 469 4936.0 3456.156 4704.219 1699.156
## 2         1 402785 5759315 925.0 880 486 4918.0 3482.719 4717.906 1696.969
## 3         1 402795 5759315 918.0 878 491 4907.0 3405.031 4582.531 1722.375
## 4         1 402765 5759305 921.0 870 473 4924.0 3570.250 4891.750 1668.125
## 5         1 402775 5759305 908.0 872 454 4899.5 3584.062 4919.188 1657.312
## 6         1 402785 5759305 896.0 856 466 4895.5 3579.688 4897.812 1649.438
##        NDVI   NDVI_sd_5 ClassID         Label                           geom
## 1 0.8264570 0.003605092      11 Planted field POLYGON ((402789.2 5759326,...
## 2 0.8201332 0.003988799      11 Planted field POLYGON ((402789.2 5759326,...
## 3 0.8180808 0.006624836      11 Planted field POLYGON ((402789.2 5759326,...
## 4 0.8247175 0.002481608      11 Planted field POLYGON ((402789.2 5759326,...
## 5 0.8303913 0.001217456      11 Planted field POLYGON ((402789.2 5759326,...
## 6 0.8261681 0.001640807      11 Planted field POLYGON ((402789.2 5759326,...

In order to speed things up, we reduce the data for this tutorial: from each training polygon, only 5% of the pixels are randomly drawn and used for model training.

set.seed(100)
trainids <- createDataPartition(extr$PolygonID,list=FALSE,p=0.05)
trainDat <- extr[trainids,]

Model training

Predictors and response

For model training we need to define the predictor and response variables. As predictors we can use basically all layers of the raster stack, as we might assume they are all meaningful for differentiating between the land cover classes. As response variable we use the “Label” column of the data frame.

predictors <- names(sen_ms)[1:9]
response <- "Label"

Model training

We then train a Random Forest model to learn how the classes can be distinguished based on the predictors (note: other algorithms would work as well; see https://topepo.github.io/caret/available-models.html for a list of algorithms available in caret). Caret’s train function does this job. Before starting model training we can specify some control settings using trainControl. For hyperparameter tuning (mtry) as well as for error assessment we use a spatial 3-fold cross-validation: the training data are split into 3 folds, but data from the same polygon are always kept together so that they never occur in both training and testing. We also make sure that each fold contains data from each land cover class. CAST’s CreateSpacetimeFolds does this job when we specify the polygon ID and the class label.

indices <- CreateSpacetimeFolds(trainDat, spacevar = "PolygonID", k = 3, class = "Label")
ctrl <- trainControl(method="cv", 
                     index = indices$index,
                     savePredictions = TRUE)

Model training is then performed using caret’s train function. However, we use a wrapper around it that selects the predictor variables relevant for making predictions at new spatial locations (forward feature selection, ffs). We specify “rf” as method, indicating that a Random Forest is applied. For model training we reduce the number of trees (ntree) to 75 to speed things up. Note that usually a larger number (>250) is appropriate. We use the Kappa index for validation.

# train the model
set.seed(100)
model <- ffs(trainDat[,predictors],
               trainDat[,response],
               method="rf",
               metric="Kappa",
               trControl=ctrl,
               importance=TRUE,
               ntree=75)
print(model)
## Random Forest 
## 
## 571 samples
##   4 predictor
##   7 classes: 'Fallow field', 'Grassland', 'Industrial', 'Inland water', 'Mixed forest', 'Planted field', 'Urban' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 389, 340, 413 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##   2     0.9499881  0.9368088
##   3     0.9468239  0.9328530
##   4     0.9431609  0.9282247
## 
## Kappa was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
plot(varImp(model))

Model validation

When we print the model (see above) we get a summary of the prediction performance as the average Kappa and Accuracy across the spatial folds. Looking at all cross-validated predictions together, we can get the “global” model performance.

# get all cross-validated predictions:
cvPredictions <- model$pred[model$pred$mtry==model$bestTune$mtry,]
# calculate cross table:
table(cvPredictions$pred,cvPredictions$obs)
##                
##                 Fallow field Grassland Industrial Inland water Mixed forest
##   Fallow field            67         0          0            0            0
##   Grassland                0        21          0            0            0
##   Industrial               0         0         34            0            0
##   Inland water             0         0          0           76            0
##   Mixed forest             0         0          0            0          120
##   Planted field            0         0          0            0            2
##   Urban                    4         0          2            0            4
##                
##                 Planted field Urban
##   Fallow field              0     6
##   Grassland                 1     0
##   Industrial                0     3
##   Inland water              0     0
##   Mixed forest              1     4
##   Planted field            70     0
##   Urban                     0   156

We see that the performance is very high and that only minor misclassifications occur.
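As a side note, the Kappa metric used for model selection above can be computed directly from such a cross table. A minimal base-R sketch with a made-up two-class confusion matrix (not the one above), showing how Kappa corrects the observed agreement for the agreement expected by chance:

```r
# Cohen's Kappa from a confusion matrix (rows = predicted, cols = observed)
cm <- matrix(c(50,  2,
                3, 45), nrow = 2, byrow = TRUE)
n     <- sum(cm)
p_obs <- sum(diag(cm)) / n                      # observed agreement
p_exp <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
kappa <- (p_obs - p_exp) / (1 - p_exp)
kappa
```

A Kappa of 0 means the classifier is no better than chance, 1 means perfect agreement; caret reports the same quantity (averaged over the cross-validation folds).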

Model prediction

To perform the classification we apply the trained model to each pixel of the raster stack using the predict function. We can then create a map of the predicted land cover with meaningful colors using the tmap package.

prediction <- predict(sen_ms, model)
cols <- c("sandybrown", "green", "darkred", "blue", "forestgreen", "lightgreen", "red")

tm_shape(prediction) +
  tm_raster(palette = cols,title = "LUC")+
  tm_scale_bar(bg.color="white",bg.alpha=0.75)+
  tm_layout(legend.bg.color = "white",
            legend.bg.alpha = 0.75)

Area of Applicability

We have seen that, technically, the trained model can be applied to the entire area of interest (and beyond, as long as the Sentinel predictors are available, which they are, even globally). But we should assess whether we SHOULD apply our model to the entire area. The model should only be applied to locations that feature predictor properties comparable to those of the training data. If the dissimilarity to the training data is larger than the dissimilarity within the training data, the model should not be applied to that location.

The calculation of the AOA is quite time consuming. To make it a bit faster we use parallelization.

cl <- makeCluster(4)
registerDoParallel(cl)
AOA <- aoa(sen_ms, model, cl = cl)
plot(AOA[1], breaks = "quantile")

plot(AOA[2])

The result of the aoa function has two layers: the dissimilarity index (DI) and the area of applicability (AOA). The DI takes values of 0 or larger, where 0 means that a location has predictor values identical to values observed in the training data; with increasing DI the dissimilarity increases. The AOA has two values, 0 and 1: 0 indicates locations outside the area of applicability, 1 indicates locations inside it. Find more information on how the AOA is derived in Meyer & Pebesma (2020).
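To build some intuition for the DI, here is a deliberately simplified base-R sketch: the distance of a new point to its nearest training point, scaled by the mean distance within the training data. (The actual aoa implementation additionally standardizes the predictors, weights them by variable importance and derives the AOA threshold from the cross-validation folds; see Meyer & Pebesma (2020).)

```r
# Toy dissimilarity index in a 2-dimensional predictor space
train <- matrix(c(0.1, 0.2,
                  0.3, 0.4,
                  0.2, 0.9), ncol = 2, byrow = TRUE)

d_train <- mean(dist(train))  # average distance within the training data

# distance to the nearest training point, scaled by d_train
di <- function(p) min(sqrt(colSums((t(train) - p)^2))) / d_train

di(c(0.25, 0.35))  # small: similar to the training data
di(c(3.0, 3.0))    # large: far outside the training predictor space
```

A location with a small DI resembles the training data and would fall inside the AOA; a large DI flags an environment the model has never seen.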

The figure above shows the predictions (top left) as well as the predictions ONLY within the AOA (top right; locations outside the AOA are shown in grey). For comparison the RGB composite is shown as well. We see that the model can be applied to most parts of Münster; however, there are some locations (especially in the south-west) that are too different in their predictor properties, so we should exclude those locations from our prediction result.
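Restricting the map to the AOA boils down to masking: set the prediction to NA wherever the AOA layer is 0. A toy example with plain matrices; the same logic applies to the prediction and AOA objects above:

```r
# Toy masking example: 1 = inside the AOA, 0 = outside
pred <- matrix(c("Forest", "Water", "Urban", "Grassland"), nrow = 2)
aoa  <- matrix(c(1, 0, 1, 1), nrow = 2)

pred[aoa == 0] <- NA   # discard predictions outside the AOA
pred
```

Plotting the masked object then shows predictions only where the model is considered applicable.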

Model transfer

The idea of the AOA probably becomes even more obvious when we use the trained model to make predictions for a completely new area (model transfer). This seems reasonable for land cover mapping because we might assume that land cover classes “look” similar (i.e. have similar spectral properties) in other locations as well. So let’s apply the model to a new area, Marburg (Germany), and use the AOA concept to see whether we can apply our model there.

To do this we use a prepared Sentinel-2 dataset from this new location and apply the model trained in Münster to make predictions for Marburg.

sen_mr <- raster::stack("data/Sen_Marburg.grd") %>%
  st_as_stars() %>%
  split("band")
prediction_mr <- predict(sen_mr, model)
AOA <- aoa(sen_mr, model, cl = cl)

We see that while the general patterns in the predictions appear to make sense, large parts of the area are outside the applicability of the model. This means that the predictor properties present in Marburg are not sufficiently covered by the training data from the Münster area.

Looking at the color composite this becomes understandable. Focusing on the forest areas as an example, we see that the areas outside the AOA appear considerably darker in the color composite. If we looked into these areas in more detail, we would see that they are coniferous forest, a land cover class frequently found around Marburg but not present in the Münster training data.

Learn from more locations

So apparently transferring models is problematic if we cannot guarantee that all land cover classes are covered. Other problems come on top: to mention just a few, the atmospheric conditions at the time of image acquisition, shifts in the season, as well as biased training sites could limit the ability of a trained model to make predictions beyond the conditions of training.

If a model is intended for transfer, it might therefore be a good idea to train it on data from different areas and conditions.

Let’s have a look at how this can change our prediction results. The following dataset includes training data sampled in 15 different areas in Germany, each using a Sentinel-2 image from a different summer date as predictors. Each location was sampled by a different person, so the training sites are expected to be unbiased. Note that Marburg (and the area around it) was not part of this survey!

multiloc <- readRDS("data/trainingdata_allareas.rds")
set.seed(100)
trainids <- createDataPartition(multiloc$uniquepoly,list=FALSE,p=0.005)
trainDat <- multiloc[trainids,]
trainDat <- trainDat[complete.cases(trainDat),]
head(trainDat)
##     ID   B02 B03 B04    B08      B06      B07      B11      NDVI    NDVI_sd_5
## 99  29 871.0 707 468 3159.0 2107.375 3164.812 1004.531 0.7419355 0.0004277991
## 132 40 902.0 772 506 3818.0 2642.750 3997.062 1566.625 0.7659575 0.0010237339
## 193 25 905.5 848 449 4826.5 3550.844 4840.969 1614.875 0.8297791 0.0010365382
## 551 28 903.0 830 453 4085.0 3000.250 4042.125 1358.938 0.8003526 0.0009923165
## 555 40 964.0 850 613 3545.0 2522.188 3579.000 1768.062 0.7051467 0.0015287382
## 564 21 926.0 951 513 4846.0 3872.438 5130.812 1425.125 0.8085464 0.0001293210
##                    Area             uniquepoly         Label
## 99             Muenster            Muenster_29 Planted field
## 132      Nordvorpommern      Nordvorpommern_40 Planted field
## 193            Muenster            Muenster_25 Planted field
## 551            Muenster            Muenster_28 Planted field
## 555      Nordvorpommern      Nordvorpommern_40 Planted field
## 564 Warburg_Diemelstadt Warburg_Diemelstadt_21 Planted field
unique(trainDat$Label)
##  [1] "Planted field"     "Fallow field"      "Inland water"     
##  [4] "Wetland"           "Shallow water"     "Greenhouses"      
##  [7] "Grassland"         "Heathland"         "Industrial"       
## [10] "Deciduous forest"  "Sea"               "Mixed forest"     
## [13] "Bog"               "Coniferous forest" "Open soil"        
## [16] "Sand"              "Reeds"             "Urban"

Note that more land cover classes occur in this larger dataset, reflecting the higher variability found when we look at more than just a single location (most of them, however, won’t occur in Marburg or Münster).

We will next use this dataset to train a new model, make a prediction and estimate the AOA using the same approach as explained above.

set.seed(100)
indices <- CreateSpacetimeFolds(trainDat, spacevar = "uniquepoly", k = 3, class = "Label")
ctrl <- trainControl(method="cv", 
                     index = indices$index,
                     savePredictions = TRUE)
set.seed(100)
model <- ffs(trainDat[,predictors],
               trainDat[,response],
               method="rf",
               metric="Kappa",
               trControl=ctrl,
               importance=TRUE,
               ntree=75)

prediction_mr <- predict(sen_mr, model)
AOA <- aoa(sen_mr, model, cl = cl)
print(model)
## Random Forest 
## 
## 1954 samples
##    6 predictor
##   18 classes: 'Bog', 'Coniferous forest', 'Deciduous forest', 'Fallow field', 'Grassland', 'Greenhouses', 'Heathland', 'Industrial', 'Inland water', 'Mixed forest', 'Open soil', 'Planted field', 'Reeds', 'Sand', 'Sea', 'Shallow water', 'Urban', 'Wetland' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1303, 1355, 1250 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##   2     0.6601567  0.6258959
##   4     0.6704455  0.6377077
##   6     0.6446251  0.6092974
## 
## Kappa was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 4.

Note that Accuracy and Kappa are lower, which is not surprising because more (and more challenging) classes are to be predicted.

As we can see, the predictions for Marburg now look considerably better: the delineation of forest types looks reasonable, and other classes like the river could be detected as well. Only very few locations are left outside the AOA.

Since the model trained across several locations is successful in making predictions for the new, unseen area of Marburg, it might be interesting to see whether this model can also make better predictions for Münster.

prediction <- predict(sen_ms,model)
AOA <- aoa(sen_ms, model, cl = cl)

We can see that if we use a model trained on different areas and under different conditions, we can improve our prediction and can now make predictions for the entire Münster area. No areas are outside the AOA anymore.

Summary

  • This tutorial has shown how to perform a remote sensing based land cover classification in R.
  • We identified the area of applicability (AOA) of the trained model to make sure that we don’t make predictions for locations that the model has no knowledge about.
  • We transferred the model to a new area and concluded that a transfer is only possible when the model has knowledge about the new environment. Again, the AOA method was applied to identify the unknown locations.
  • We have seen that transferability can be improved when a model is trained on more heterogeneous data.
  • Communicating the AOA is important to avoid mis-planning when predictive mapping is used as a tool for decision making (e.g. in the context of nature conservation), as well as to avoid propagation of massive errors when spatial predictions are used as input for subsequent modelling.
  • Note that the presented dissimilarity index can also be used in the context of guided training data sampling, with the aim of increasing the AOA of a prediction model.

Get further help

For further help on handling raster and vector data in R see e.g. https://geocompr.github.io/. More information on the relevance of spatial validation strategies can be found in the previous OpenGeoHub recordings (https://www.youtube.com/watch?v=mkHlmYEzsVQ) as well as in e.g. Meyer et al. (2019). The methodology to estimate the AOA is described in Meyer & Pebesma (2020).